Parallel combination of speech streams for improved ASR
Authors
Abstract
In a growing number of applications, such as simultaneous interpretation, audio or text conveying the same information may be available in different languages. These different views contain redundant information that can be exploited to enhance the performance of speech and language processing applications. We propose a method that directly integrates ASR word graphs or lattices and phrase tables from an SMT system to combine such parallel speech data and improve ASR performance. We apply this technique to speeches from four European Parliament committees and obtain a 16.6% relative improvement (20.8% after a second iteration) in WER, when Portuguese and Spanish interpreted versions are combined with the original English speeches. Our results indicate that further improvements may be possible by including additional languages.
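To make the reported metric concrete, the following minimal sketch shows how word error rate (WER) and a relative improvement in WER are conventionally computed. It is not the authors' evaluation code: the function names, the example sentences, and the WER values 0.30 and 0.25 are hypothetical and chosen only for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline WER that the new system removes."""
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical sentences: one deleted word out of five reference words.
    wer = word_error_rate("the committee adopted the report",
                          "the committee adopted report")
    print(f"WER: {wer:.3f}")

    # Hypothetical WER values, not taken from the paper.
    gain = relative_improvement(0.30, 0.25)
    print(f"Relative improvement: {gain:.1%}")
```

A relative improvement is measured against the baseline error rate, so it is larger than the absolute difference between the two WER values whenever the baseline WER is below 100%.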
Similar resources
Stream selection and integration in multistream ASR using GMM-based performance monitoring
A moderately deep and rather wide artificial neural net is applied in phoneme recognition of noisy speech. The net is formed by first estimating posterior probabilities of phonemes in 21 band-limited streams covering the whole speech spectrum. These 21 band-limited streams are subdivided into three seven band-limited stream subsets, by differently sub-sampling the original 21 band-limited strea...
Combining connectionist multi-band and full-band probability streams for speech recognition of natural numbers
Multi-band automatic speech recognition is a new and exploratory area of speech recognition which has been getting much attention in the research community. It has been shown that multi-band ASR reduces word error in noisy conditions, particularly in the case of narrow band noise. In this work we show that multi-band ASR could be used to improve the speech recognition accuracy of natural numbers...
The full combination sub-bands approach to noise robust HMM/ANN based ASR
The performance of most ASR systems degrades rapidly with data mismatch relative to the data used in training. Under many realistic noise conditions a significant proportion of the spectral representation of a speech signal, which is highly redundant, remains uncorrupted. In the “missing feature” approach to this problem mismatching data is simply ignored, but the need to base recognition on un...
From Multi-Band Full Combination to Multi-Stream Full Combination Processing in Robust ASR
The multi-band processing paradigm for noise robust ASR was originally motivated by the observation that human recognition appears to be based on independent processing of separate frequency sub-bands, and also by “missing data” results which have shown that ASR can be made significantly more robust to band-limited noise if noisy sub-bands can be detected and then ignored. Of the different mult...
Multi-stream ASR: an oracle perspective
Multi-stream based automatic speech recognition (ASR) systems are usually shown to outperform single stream systems, especially in noisy test conditions. And, indeed, there is a trend today in ASR towards using more and more acoustic features combined at the input (early integration, possibly preceded by some linear or nonlinear transformation) or later in the recognition process (e.g., at the l...